<h1><a href="#data-appendix-indian-pharmaceutical-patent-prosecution-the-changing-role-of-section-3d" id="data-appendix-indian-pharmaceutical-patent-prosecution-the-changing-role-of-section-3d">Data Appendix: Indian pharmaceutical patent prosecution: The changing role of Section 3(d)</a></h1>
<h1><a href="#introduction" id="introduction">Introduction</a></h1>
<p>This Appendix aims to provide Stata code to fully reproduce all results in the paper, using the data files available from Bhaven Sampat&rsquo;s Dataverse at <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FKNWKIV">https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FKNWKIV</a>.</p>
<p>This is a dynamic Stata document (<a href="https://www.stata.com/new-in-stata/markdown/">https://www.stata.com/new-in-stata/markdown/</a>): after you create the directory structure below and copy the files from Dataverse to the appropriate directories, running the command <code>dyndoc &quot;code/Data Appendix.md&quot;, replace</code> should fully reproduce the results in the paper. (If using the Dataverse files, download the <code>src</code> and <code>byhand</code> files in original format and place them in those directories.) Two packages, <code>tabplot</code> and <code>scheme_tufte</code>, are required. The output from the <code>dyndoc</code> command (reproducing all output in the paper) is written to <code>Data Appendix.html</code>.</p>
<p>The <code>dyndoc</code> command runs in Stata 15.0 or later. For those without Stata 15.0, the code from this document can be pasted into a .DO file for any recent version of Stata to reproduce the results as well. We also include a CSV version of the final analysis file for those who prefer to analyze it directly, even using different software.</p>
<p>In the &ldquo;Background and Setup&rdquo; section below we discuss data sources and preparation. If you are interested only in the final files used for analyses and to produce the figures and other output in the paper, please skip to the &ldquo;Analysis&rdquo; section that follows.</p>
<h1><a href="#background-and-setup" id="background-and-setup">Background and setup</a></h1>
<p>If you wish to run the <code>dyndoc</code> command to reproduce the results, create the following directories and populate the <code>src</code>, <code>byhand</code>, and <code>code</code> directories with the files indicated on Dataverse.</p>
<table>
<thead>
<tr><th>  Directory </th><th> Description   </th></tr>
</thead>
<tbody>
<tr><td>src  </td><td> raw OECD TPF  data  </td></tr>
<tr><td> byhand   </td><td> files coded by hand   </td></tr>
<tr><td> processed   </td><td> final datasets used for analysis </td></tr>
<tr><td> tmp  </td><td> temporary datafiles generated during processing   </td></tr>
<tr><td> code </td><td> this data document and/or individual DO files </td></tr>
</tbody>
</table>
<p>We began with the OECD Triadic Patent Families (TPF) database (September 2015), which covers patent applications filed at the EPO, the JPO, and the USPTO that share the same set of priorities. (The OECD database is a subset of the EPO PATSTAT database, with various value-added fields.) We downloaded the OECD TPF from the public FTP site here: <a href="ftp://prese:Patents@ftp.oecd.org/">ftp://prese:Patents@ftp.oecd.org/</a>. Since this site does not store archival versions, we include the relevant tables in the ZIP file provided above.</p>
<p>Specifically, we used data from three tables from the TPF:</p>
<ol>
<li><code>201509_TPF_Core.txt</code></li>
<li><code>201509_TPF_PCT.txt</code></li>
<li><code>201509_TPF_IPC.txt</code></li>
</ol>
<p>The unit of analysis is a patent family, denoted by the variable <em>family_id</em>.  The analyses focus on applications that met the following criteria:</p>
<ol>
<li>Have main international patent class (IPC) of &ldquo;A61K&rdquo; or &ldquo;C07D&rdquo; (to focus on pharmaceutical applications)</li>
<li>Have priority filing years between 2000 and 2012</li>
<li>Have a PCT filing</li>
<li>Do not have Derwent patent codes associated with biologic drugs or non-pharmaceutical inventions</li>
<li>Have at least one Indian national stage application</li>
<li>Have priority month of July</li>
</ol>
<p>Below, we discuss each of these.</p>
<h2><a href="#creating-an-extract-from-tpf-dataset" id="creating-an-extract-from-tpf-dataset">Creating an extract from TPF dataset</a></h2>
<p>We began by converting the three TPF files to Stata format:</p>
<pre><code>. 
. /* Setup: This linewidth helps make the final output files look nicer */
. 
. set linesize 80

. 
. /* Import the TPF_Core, TPF_IPC, and TPF_PCT files and save to temporary direc
&gt; tory */
. 
. foreach i in Core IPC PCT {

. 
</code></pre>
<p>Next, we worked with these files to isolate applications with IPC code A61K and/or C07D, priority filing years 2000 to 2012, and a PCT filing:</p>
<pre><code>. 
. /* Keep the applications with one of the two IPC codes anywhere */
. 
. use tmp/TPF_IPC, clear

. keep if index(ipc,&quot;A61K&quot;)==1 | index(ipc,&quot;C07D&quot;)==1

. 
. /* Keep unique application identifier */
. 
. bysort family_id: keep if _n==1

. keep family_id

. 
. /* Save these to temporary directory to be used later */
. 
. save tmp/family_to_ipc, replace

. 
. /* Keep the applications with priority years 2000 - 2012 */
. 
. clear

. use tmp/TPF_Core, clear

. 
. /* Generate a priority year variable and keep in range */
. 
. qui: tostring first_prio, gen(prio_string)

. gen prioyear = real(substr(prio_string,1,4))

. keep if prioyear&gt;=2000 &amp; prioyear&lt;=2012

. 
. /* Merge in matching applications from IPC A61K / C07D set */
. 
. joinby family_id using tmp/family_to_ipc, unmatched(master)

. keep if _merge==3

. drop _merge

. 
. /* Keep only the applications with a PCT filing */
. keep if count_pct~=.

. 
. /* get the PCT number */
. 
. keep family_id first_prio

. joinby family_id using tmp/TPF_PCT, unmatched(master)

. assert _merge==3

. 
. /* keep family_id, pct_number, priority_date */
. 
. keep family_id pct_nbr first_prio

. 
. save tmp/TPF_extract, replace

. 
</code></pre>
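<p>For readers working outside Stata, the selection logic above can be sketched in Python. This is an illustrative sketch, not the code used for the paper: it assumes <code>first_prio</code> holds a YYYYMMDD value (as the <code>substr</code> calls suggest), the function names are hypothetical, and a missing <code>count_pct</code> is represented by <code>None</code>. Note that <code>index(ipc,&quot;A61K&quot;)==1</code> is true only when the IPC code begins with that prefix.</p>

```python
# Illustrative sketch only; function names are hypothetical.

def priority_year(first_prio):
    """Return the year from a YYYYMMDD priority date (assumed format)."""
    return int(str(first_prio)[:4])


def has_pharma_ipc(ipc_codes):
    """True if any IPC code listed for the family starts with A61K or C07D.

    Mirrors index(ipc, "A61K") == 1 | index(ipc, "C07D") == 1 in Stata,
    which matches only when the prefix sits at position 1 of the code.
    """
    return any(code.startswith(("A61K", "C07D")) for code in ipc_codes)


def in_extract(first_prio, ipc_codes, count_pct):
    """Apply the three filters: IPC class, priority year, PCT filing."""
    return (
        has_pharma_ipc(ipc_codes)
        and 2000 <= priority_year(first_prio) <= 2012
        and count_pct is not None  # Stata: count_pct ~= .
    )
```

<p>With these definitions, a family with a made-up priority date of <code>20050714</code>, an IPC code beginning with <code>A61K</code>, and a non-missing PCT count would be retained, while the same family with a 1999 priority date would not.</p>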
<h2><a href="#filtering-by-dwpi-class" id="filtering-by-dwpi-class">Filtering by DWPI class</a></h2>
<p>We learned from previous research (Sampat and Shadlen 2017) that filtering by IPC class was a good first step, but that many of the resulting applications were not drug-related or were related to biologic drugs. Since our interest is in small molecule drugs, we filtered further using data from the Derwent World Patents Index (DWPI).</p>
<p>Specifically, we searched for each PCT application in the Derwent World Patents Index, accessed through Thomson Reuters (now Clarivate):</p>
<p><a href="https://clarivate.com/products/dwpi-reference-center/">https://clarivate.com/products/dwpi-reference-center/</a></p>
<p>We downloaded the DWPI manual codes for each application, and isolated applications that were drug related (having DWPI category &ldquo;B&rdquo;) but not biologic (not having DWPI code B04-E, D05-H, B04-F, or B04-G). The resulting applications are in the file <code>byhand/dwpi_pharma.csv</code>. About half of the original dataset remains after this filter:</p>
<pre><code>. /* Import the DWPI search results: these are &quot;pharma&quot; apps left after DWPI fil
&gt; tering */
. 
. insheet using byhand/dwpi_pharma.csv, comma names clear

. 
. /* Merge back the TPF file */
. 
. joinby pct_nbr using tmp/TPF_extract, unmatched(both)

. tab _merge

. keep if _merge==3

. drop _merge

. save tmp/TPF_extract_pharma, replace

. 
</code></pre>
<h2><a href="#obtaining-indian-national-stage-applications" id="obtaining-indian-national-stage-applications">Obtaining Indian national stage applications</a></h2>
<p>The next step was to collect information on national phase application numbers. There are several ways to do this, including searching the Indian Patent Office database &laquo;http://ipindiaservices.gov.in/publicsearch&raquo; or Patent Scope &laquo;https://patentscope.wipo.int/search/en/search.jsf&raquo; databases by hand.</p>
<p>Rather than search by hand, we provided our PCT numbers to WIPO, which produced for us the corresponding Indian application numbers from the WIPO Statistics Database. We verified that this information was essentially identical to what would have been available from searching the Indian Patent Office database or Patent Scope as of late 2016, when we conducted the searches. These applications are in the file <code>byhand/wipo_india.csv</code>.</p>
<p>Unfortunately, the Indian national data in the WIPO Statistics Database are not complete for PCT applications filed after 2012, so we focus on PCT applications filed through the end of 2011.</p>
<pre><code>. /* Import WIPO national stage applications for applications in our sample */
. 
. insheet using byhand/wipo_india.csv, comma names clear

. joinby pct_nbr using tmp/TPF_extract_pharma, unmatched(both)

. tab _merge

. /* Keep only those from TPF_extract_pharma which have an indian national phase
&gt;  application */
. 
. keep if _merge==3

. 
. /* Generate a PCT year based on the PCT number, and keep 2011 and earlier */
. 
. gen year = real(substr(pct,3,4))

. keep if year&lt;=2011

. drop year _merge

. save tmp/TPF_extract_pharma_india, replace

</code></pre>
<p>Finally, to keep the sample size tractable (since our coding of applications and outcomes below is done by hand) we focused on applications where July was the priority month:</p>
<pre><code>. use tmp/TPF_extract_pharma_india, clear

. tostring first_prio, replace force 

. replace first_prio = substr(first_prio,1,4) + &quot;/&quot; + substr(first_prio,5,2) + &quot;
&gt; /&quot; + substr(first_prio,7,.)

. 
. gen priodate = date(first_prio,&quot;YMD&quot;)

. format %d priodate

. 
. gen prioyear=year(priodate)

. gen priomonth = month(priodate)

. keep if priomonth==7

. save tmp/finalextract, replace

</code></pre>
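<p>The two date-based filters above (PCT year of 2011 or earlier, priority month of July) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: <code>first_prio</code> is taken to be a YYYYMMDD value, the PCT number is taken to carry its four-digit year in characters 3 to 6 (as <code>substr(pct,3,4)</code> implies), and the function names and the sample PCT numbers in the note below are hypothetical.</p>

```python
from datetime import datetime

# Illustrative sketch only; function names are hypothetical.

def priority_month(first_prio):
    """Month (1-12) of a YYYYMMDD priority date (assumed format)."""
    return datetime.strptime(str(first_prio), "%Y%m%d").month


def pct_year(pct_nbr):
    """Year embedded in the PCT number; mirrors real(substr(pct, 3, 4))."""
    return int(pct_nbr[2:6])


def keep_application(first_prio, pct_nbr):
    """Keep PCT filings through 2011 whose priority month is July."""
    return pct_year(pct_nbr) <= 2011 and priority_month(first_prio) == 7
```

<p>For instance, <code>keep_application(20050714, "WO2006012345")</code> would keep a made-up application with a July 2005 priority date and a 2006 PCT number, while an August priority date or a 2012 PCT number would drop it.</p>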
<p>This left 1,993 PCT applications with an Indian national stage application.</p>
<h2><a href="#coding-the-applications-as-primary-or-secondary" id="coding-the-applications-as-primary-or-secondary">Coding the applications as primary or secondary</a></h2>
<p>We provided the PCT applications to a patent attorney with expertise in drug patents, and asked her to code the claims according to the coding guide described in Sampat and Shadlen (2017), which in turn was adapted from Hemphill and Sampat (2011, 2012). In general we coded each patent as to whether it contained the following types of claims:</p>
<ul>
<li>A: active ingredient (see specific descriptions of A1-A4 below)</li>
<li>B: formulation or composition</li>
<li>C: method of use</li>
<li>D: other, but related to the drug</li>
<li>E: biologic</li>
</ul>
<p>A patent application can, and often does, include more than one category of claims. We also included a category &ldquo;Z&rdquo; for applications that were not actually drug related.</p>
<p>For active ingredient claims, we distinguished the four subcategories:</p>
<ul>
<li>A1: active ingredient</li>
<li>A2: polymorphs or other crystal forms</li>
<li>A3: enantiomers or other isomers</li>
<li>A4: salts, metabolites, or intermediates; also pre-metabolites and derivatives</li>
</ul>
<p>The file <code>byhand/claimscoding.csv</code> includes coding of each of these applications.</p>
<p>We dropped all applications with only pure product claims (category &ldquo;D&rdquo;) or those with only biologic claims (&ldquo;E&rdquo;) or non-pharma applications (&ldquo;Z&rdquo;). In practice, the DWPI filtering worked well so there were only a small number of E and Z applications in the set.</p>
<p>We categorized as secondary any application without at least one &ldquo;A1&rdquo; chemical compound claim. This is thus a narrow definition of primary applications: it excludes those with polymorph/crystalline structure claims (A2), enantiomer and stereoisomer claims (A3), and claims on salts, metabolites, intermediates, pre-metabolites, and derivatives (A4), <em>unless they also have an A1 claim</em>.</p>
<p>The resulting file is <code>tmp/finalextract-coded.dta</code>.</p>
<pre><code>. insheet using byhand/claimscoding.csv, clear comma

. keep if  (a1_final + a2_final + a3_final + a4_final + b_final + c_final) &gt;0

. gen secondary = a1_final1==0

. keep pct_nbr secondary

. joinby pct_nbr using tmp/finalextract, unmatched(master)

. assert _merge==3

. drop _merge

. keep pct_nbr secondary national_number prioyear

. 
. save tmp/finalextract-coded, replace

</code></pre>
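<p>The primary/secondary rule can be restated compactly in Python. This is a hedged sketch rather than the code used for the paper: it assumes each claim-type flag is coded 0 or 1, and the function name <code>classify</code> is hypothetical.</p>

```python
# Illustrative sketch only; assumes each claim-type flag is 0 or 1.

def classify(a1, a2, a3, a4, b, c):
    """Return None for a dropped application, 0 for primary, 1 for secondary.

    An application is dropped when it has no A, B, or C claims at all
    (only D, E, or Z). Otherwise it is secondary exactly when it lacks
    an A1 (active ingredient) claim.
    """
    if a1 + a2 + a3 + a4 + b + c == 0:
        return None
    return int(a1 == 0)
```

<p>Under this rule, an application with both A1 and A2 claims is primary, while one with only polymorph (A2) and formulation (B) claims is secondary.</p>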
<h2><a href="#determining-indian-outcomes-for-the-applications" id="determining-indian-outcomes-for-the-applications">Determining Indian outcomes for the applications</a></h2>
<p>We also searched for the Indian outcomes for each application on the Indian patent office website &laquo;https://ipindiaservices.gov.in/publicsearch&raquo;. The file <code>byhand/ipo-outcomes.csv</code> contains status, aggregated to five categories (Abandoned, Granted, Pending, Refused, Withdrawn).</p>
<h2><a href="#assessing-role-of-3d-from-fers" id="assessing-role-of-3d-from-fers">Assessing role of 3(d) from FERs</a></h2>
<p>Next, we used the Indian patent office database &laquo;https://ipindiaservices.gov.in/publicsearch&raquo; to locate the first examination reports (FERs) for the applications. Specifically, we searched the Application Status tab for each application, then used information in the &ldquo;View Documents&rdquo; or &ldquo;View Examination Report&rdquo; tabs to locate the FER.</p>
<p>For applications that had an FER available, we read the FER and determined if there was (1) any 3(d) objection and (2) a 3(d) objection on claim 1. The file <code>byhand/fer_coded_3d.csv</code> contains the coding. For applications with an FER, the field <em>claim1_3d</em> is blank if there is no 3(d) objection in the FER, 0 if there is a specific 3(d) objection but not on claim 1, 1 if there is a 3(d) objection on claim 1, and X if there is a general 3(d) objection not specific to any claim. As explained in the text, the FERs are often vague, referring generally to &ldquo;claims&rdquo; not meeting the standards of Section 3(d) without specifying which claims; these are coded as &ldquo;X&rdquo;. However, where FERs refer to all of the claims in an application (e.g. &ldquo;the claims,&rdquo; &ldquo;the present claims,&rdquo; or &ldquo;the claims in this invention&rdquo;), these are coded as 1.</p>
<p>For 2005-2007 applications with FERs (almost all of which were electronic, making them easier to code) we also coded whether and where there was a novelty or inventive step objection.  The file <code>byhand/fer_coded_3d_detailed.csv</code> contains these codings. The fields <em>claim1_novinv</em> and <em>anynovinv</em> are coded analogously to <em>claim1_3d</em>  above.</p>
<p>For applications from 2005-2007 that were granted we also determined the grant lag based on certificate of issue date  and application date (based on information from the Application status tab). These are collected in <code>byhand/fer_coded_3d_detailed_grants.csv</code>.</p>
<p>We converted each of the input files generated by hand to Stata format, so they could be used in the analyses.</p>
<pre><code>. /* Convert all hand-coded files to Stata to merge with the final set of coded
&gt;  applications */
. 
. insheet using byhand/fer_coded_3d.csv, names comma clear

. rename inputnational_number national_number

. save tmp/fer_coded_3d, replace

. 
. insheet using byhand/fer_coded_3d_detailed.csv, names comma clear

. rename inputnational_number national_number

. save tmp/fer_coded_3d_detailed, replace

. 
. insheet using byhand/fer_coded_3d_detailed_grants.csv, names comma clear

. rename inputnational_number national_number

. save tmp/fer_coded_3d_detailed_grants, replace

. 
. insheet using byhand/ipo-outcomes.csv, names comma clear

. rename inputnational_number national_number

. save tmp/ipo-outcomes, replace

. 
</code></pre>
<p>Finally, we merged these files containing application outcomes, codings of objection types for applications with FER, coding of detailed objection types for 2005-7 applications with FERs, and information on grant lags for the granted applications from the 2005-7 cohort with the basic application file (<code>finalextract-coded</code>). We also added variable and value labels to create the final dataset to be used in the analyses, <code>processed/india3d.dta</code>. A CSV file with the same name is also produced for those who wish to reproduce the results using different software.</p>
<pre><code>. use tmp/finalextract-coded, clear

. 
. label var pct_nbr                       &quot;Application number&quot;

. label var secondary             &quot;Secondary (1=yes)&quot;

. label var national_number       &quot;Indian application number&quot;

. label var prioyear                      &quot;Priority year&quot;

. 
. /* Merge in outcomes */
. 
. joinby national_number using tmp/ipo-outcomes, unmatched(master)

. assert _merge==3

. drop _merge

. 
. /* Merge in overall FER coding */
. 
. joinby national_number using tmp/fer_coded_3d, unmatched(master)

. assert _merge==3

. drop _merge

. 
. gen any3d = claim1_3d~=&quot;&quot;

. 
. label var anyfer                        &quot;Has FER&quot;

. label var any3d                 &quot;Any 3d objection?&quot;

. 
. /* Merge in 2006-7 detailed FER coding */
. 
. joinby national_number using tmp/fer_coded_3d_detailed, unmatched(master)

. 
. gen detailedfer = _merge==3

. drop _merge

. 
. gen anynovinv = claim1_novinv~=&quot;&quot; 

. replace anynovinv = . if detailedfer==0

. 
. label var detailedfer   &quot;Has detailed coding of FER (2006-7 only)&quot;

. label var anynovinv &quot;Any novelty/inventive step objection?&quot;

. 
. /* Merge in information on grant lags for granted applications 2006-7 */
. 
. joinby national_number using tmp/fer_coded_3d_detailed_grants, unmatched(maste
&gt; r)

. assert _merge==3 if (detailedfer ==1 &amp; outcome==&quot;Grant&quot;)

. drop _merge

. 
. label var lag &quot;Grant lag in days(2006-7 granted applications only)&quot;

. 
. order pct national_number prioyear secondary outcome anyfer claim1_3d any3d de
&gt; tailedfer claim1_novinv  anynovinv lag

. 
. label define yn 0 &quot;no&quot; 1 &quot;yes&quot;

. label values any3d yn

. label values anynovinv yn

. 
. replace claim1_3d=&quot;-9&quot; if claim1_3d==&quot;&quot;

. replace claim1_novinv = &quot;-9&quot; if claim1_novinv==&quot;&quot; &amp; detailedfer==1

. 
. 
. replace claim1_3d = &quot;999&quot; if claim1_3d==&quot;X&quot;

. replace claim1_novinv = &quot;999&quot; if claim1_novinv==&quot;X&quot;

. 
. 
. destring claim1_3d claim1_novinv, replace force

. 
. 
. tab claim1_3d claim1_novinv , row col missing

. 
. label define rejcat -9 &quot;none&quot; 0 &quot;other claim&quot; 1 &quot;claim 1&quot; 999 &quot;vague&quot; 

. label values claim1_3d rejcat

. label values claim1_novinv rejcat

. 
. label var claim1_3d &quot;3(d) objection type&quot;

. label var claim1_novinv &quot;Novelty/Inventive Step objection type&quot;

. 
. gen grant = outcome==&quot;Grant&quot;

. label define grantl 0 &quot;not granted&quot; 1 &quot;granted&quot;

. label values grant grantl

. 
. label define secondaryl 0 &quot;Primary&quot; 1 &quot;Secondary&quot;, replace

. label values secondary secondaryl

. 
. gen year = real(substr(pct,3,4))

. 
. label var year &quot;Application Year&quot;

. 
. save processed/india3d, replace

. codebook

. 
. outsheet using processed/india3d.csv, names comma replace

.  
. 
</code></pre>
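<p>Readers using the CSV inputs in other software may find it helpful to see the objection recoding in isolation. The sketch below mirrors the <code>replace</code> and <code>destring</code> steps above for a single raw value; the names <code>recode_objection</code>, <code>any_objection</code>, and <code>REJCAT_LABELS</code> are hypothetical. It covers <em>claim1_3d</em> directly; for <em>claim1_novinv</em>, the Stata code assigns -9 to blanks only when <em>detailedfer</em> is 1 and leaves other blanks missing, which this sketch does not model.</p>

```python
# Illustrative sketch only; names are hypothetical.

# Value labels matching the "rejcat" label defined in the Stata code.
REJCAT_LABELS = {-9: "none", 0: "other claim", 1: "claim 1", 999: "vague"}


def recode_objection(raw):
    """Map a raw hand-coded value ("", "0", "1", "X") to its numeric code."""
    if raw == "":
        return -9   # no objection of this type in the FER
    if raw == "X":
        return 999  # objection not tied to a specific claim
    return int(raw)  # 0 = other claim, 1 = claim 1


def any_objection(raw):
    """1 if the FER contains any objection of this type, else 0."""
    return int(raw != "")
```

<p>For example, <code>REJCAT_LABELS[recode_objection("X")]</code> gives the label <code>"vague"</code>, and <code>any_objection("0")</code> is 1 because a non-blank value signals an objection on some claim.</p>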
<h1><a href="#analysis" id="analysis">Analysis</a></h1>
<p>Below we reproduce the full code run using the <code>india3d</code> file, to generate the graphs in the text, and also any results related to the graph (e.g. statistical tests) that are noted in the text.</p>
<h2><a href="#figure-1" id="figure-1">Figure 1</a></h2>
<pre><code>. use processed/india3d

. 
. 
. * Discussion in text: Share of applications with FER over time
. 
. tab anyfer if year&gt;=2001 &amp; year&lt;=2004

    Has FER |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         74       21.89       21.89
          1 |        264       78.11      100.00
------------+-----------------------------------
      Total |        338      100.00

. tab anyfer if year&gt;=2008 &amp; year&lt;=2011

    Has FER |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        377       47.66       47.66
          1 |        414       52.34      100.00
------------+-----------------------------------
      Total |        791      100.00

. 
. * Discussion in text: How many FER could we find
. count if anyfer==1
  1,283

. 
. * Discussion in text: Share of applications with FER by status
. tab outcome anyfer, row column

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
+-------------------+

           |        Has FER
   outcome |         0          1 |     Total
-----------+----------------------+----------
   Abandon |        18        578 |       596 
           |      3.02      96.98 |    100.00 
           |      3.16      45.05 |     32.16 
-----------+----------------------+----------
     Grant |        32        472 |       504 
           |      6.35      93.65 |    100.00 
           |      5.61      36.79 |     27.20 
-----------+----------------------+----------
   Pending |       315        162 |       477 
           |     66.04      33.96 |    100.00 
           |     55.26      12.63 |     25.74 
-----------+----------------------+----------
   Refused |         2         64 |        66 
           |      3.03      96.97 |    100.00 
           |      0.35       4.99 |      3.56 
-----------+----------------------+----------
 Withdrawn |       203          7 |       210 
           |     96.67       3.33 |    100.00 
           |     35.61       0.55 |     11.33 
-----------+----------------------+----------
     Total |       570      1,283 |     1,853 
           |     30.76      69.24 |    100.00 
           |    100.00     100.00 |    100.00 


. 
. * Discussion in text: Share with 3d and FER that have ambiguous 3(d)
. 
. tab claim1_3d  if year&gt;=2001 &amp; year&lt;=2004  &amp; anyfer==1 &amp; any3d ==1

       3(d) |
  objection |
       type |      Freq.     Percent        Cum.
------------+-----------------------------------
other claim |         15       14.02       14.02
    claim 1 |         50       46.73       60.75
      vague |         42       39.25      100.00
------------+-----------------------------------
      Total |        107      100.00

. tab claim1_3d  if year&gt;=2008 &amp; year&lt;=2011  &amp; anyfer==1 &amp; any3d ==1

       3(d) |
  objection |
       type |      Freq.     Percent        Cum.
------------+-----------------------------------
other claim |         14        4.73        4.73
    claim 1 |        241       81.42       86.15
      vague |         41       13.85      100.00
------------+-----------------------------------
      Total |        296      100.00

. 
. 
. /* Focus analysis on applications with FER */
. 
. keep if anyfer==1
(570 observations deleted)

. 
. /* Create indicators for general and specific 3(d) objections */
. 
. gen rejected3d = any3d==1

. gen rejected3d1 = claim1_3d==1

. 
. /* Generate the graph */
. 
. collapse (mean) rejected3d rejected3d1, by(year)

. label var rejected3d &quot;3(d) objection&quot;

. label var rejected3d1 &quot;3(d) objection on claim 1&quot;

. keep if year&gt;=2002 &amp; year&lt;=2010
(2 observations deleted)

. twoway (line rejected3d rejected3d1 year), scheme(tufte) 

</code></pre>
<img src="figure1.png" height="400" alt=" " >
<img src="figure1.eps" height="400" alt=" " >
<h2><a href="#figure-2" id="figure-2">Figure 2</a></h2>
<pre><code>. use processed/india3d,clear

. keep if year&gt;=2006 &amp; year&lt;=2007
(1,358 observations deleted)

. keep if anyfer==1
(68 observations deleted)

. assert detailedfer==1

. 
. * Discussion in text
. 
. tab any3d anynovinv, row column cell

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
|  cell percentage  |
+-------------------+

           | Any novelty/inventive
    Any 3d |    step objection?
objection? |        no        yes |     Total
-----------+----------------------+----------
        no |        16        113 |       129 
           |     12.40      87.60 |    100.00 
           |     59.26      28.25 |     30.21 
           |      3.75      26.46 |     30.21 
-----------+----------------------+----------
       yes |        11        287 |       298 
           |      3.69      96.31 |    100.00 
           |     40.74      71.75 |     69.79 
           |      2.58      67.21 |     69.79 
-----------+----------------------+----------
     Total |        27        400 |       427 
           |      6.32      93.68 |    100.00 
           |    100.00     100.00 |    100.00 
           |      6.32      93.68 |    100.00 


. 
. /* Generate the graph */
. 
. tabplot any3d anynovinv , scheme(tufte) showval 

. 
</code></pre>
<img src="figure2.png" height="400" alt=" " >
<img src="figure2.eps" height="400" alt=" " >
<h2><a href="#figure-3" id="figure-3">Figure 3</a></h2>
<pre><code>. 
. use processed/india3d,clear

. keep if detailedfer==1
(1,426 observations deleted)

. 
. * In text discussion
. tab claim1_3d claim1_novinv , row column cell

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
|  cell percentage  |
+-------------------+

       3(d) |
  objection |    Novelty/Inventive Step objection type
       type |      none  other cla    claim 1      vague |     Total
------------+--------------------------------------------+----------
       none |        16          4         88         21 |       129 
            |     12.40       3.10      68.22      16.28 |    100.00 
            |     59.26      44.44      26.83      33.33 |     30.21 
            |      3.75       0.94      20.61       4.92 |     30.21 
------------+--------------------------------------------+----------
other claim |         1          2         31          1 |        35 
            |      2.86       5.71      88.57       2.86 |    100.00 
            |      3.70      22.22       9.45       1.59 |      8.20 
            |      0.23       0.47       7.26       0.23 |      8.20 
------------+--------------------------------------------+----------
    claim 1 |         9          2        182         15 |       208 
            |      4.33       0.96      87.50       7.21 |    100.00 
            |     33.33      22.22      55.49      23.81 |     48.71 
            |      2.11       0.47      42.62       3.51 |     48.71 
------------+--------------------------------------------+----------
      vague |         1          1         27         26 |        55 
            |      1.82       1.82      49.09      47.27 |    100.00 
            |      3.70      11.11       8.23      41.27 |     12.88 
            |      0.23       0.23       6.32       6.09 |     12.88 
------------+--------------------------------------------+----------
      Total |        27          9        328         63 |       427 
            |      6.32       2.11      76.81      14.75 |    100.00 
            |    100.00     100.00     100.00     100.00 |    100.00 
            |      6.32       2.11      76.81      14.75 |    100.00 


. 
. /* Generate the graph */
. 
. tabplot claim1_3d claim1_novinv , scheme(tufte) showval  

. 
</code></pre>
<img src="figure3.png" height="400" alt=" " >
<img src="figure3.eps" height="400" alt=" " >
<h2><a href="#figure-4" id="figure-4">Figure 4</a></h2>
<pre><code>. 
. use processed/india3d,clear

. keep if detailedfer==1
(1,426 observations deleted)

. 
. /* Generate categories for nov/inv plus 3(d) vs. none*/
. 
. gen cat1 = .
(427 missing values generated)

. replace cat1 = 0 if any3d==0 &amp; anynovinv==0
(16 real changes made)

. replace cat1 = 1 if any3d==1 &amp; anynovinv==1
(287 real changes made)

. 
. * In text discussion of grant rates by categories
. 
. tab cat1, sum(grant)

            |          Summary of grant
       cat1 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |        .625          .5          16
          1 |   .24041812   .42808383         287
------------+------------------------------------
      Total |   .26072607   .43975701         303

. ttest grant, by(cat1)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      16        .625        .125          .5    .3585688    .8914312
       1 |     287    .2404181     .025269    .4280838    .1906813    .2901549
---------+--------------------------------------------------------------------
combined |     303    .2607261    .0252634     .439757    .2110115    .3104406
---------+--------------------------------------------------------------------
    diff |            .3845819    .1109571                 .166232    .6029318
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   3.4660
Ho: diff = 0                                     degrees of freedom =      301

    Ha: diff &lt; 0                 Ha: diff != 0                 Ha: diff &gt; 0
 Pr(T &lt; t) = 0.9997         Pr(|T| &gt; |t|) = 0.0006          Pr(T &gt; t) = 0.0003

. 
. /* Generate categories for nov/inv alone vs. with 3(d) */
. 
. gen cat2 = .
(427 missing values generated)

. replace cat2 = 0 if any3d==0 &amp; anynovinv==1
(113 real changes made)

. replace cat2 = 1 if any3d==1 &amp; anynovinv==1
(287 real changes made)

. 
. * In text discussion of grant rates by categories
. 
. tab cat2, sum(grant)

            |          Summary of grant
       cat2 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |   .33628319    .4745415         113
          1 |   .24041812   .42808383         287
------------+------------------------------------
      Total |       .2675    .4432097         400

. ttest grant, by(cat2)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     113    .3362832    .0446411    .4745415    .2478326    .4247338
       1 |     287    .2404181     .025269    .4280838    .1906813    .2901549
---------+--------------------------------------------------------------------
combined |     400       .2675    .0221605    .4432097    .2239341    .3110659
---------+--------------------------------------------------------------------
    diff |            .0958651     .049049               -.0005624    .1922925
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   1.9545
Ho: diff = 0                                     degrees of freedom =      398

    Ha: diff &lt; 0                 Ha: diff != 0                 Ha: diff &gt; 0
 Pr(T &lt; t) = 0.9743         Pr(|T| &gt; |t|) = 0.0513          Pr(T &gt; t) = 0.0257

. 
. /* Collapse and graph */
. 
. gen count = 1

. collapse (mean) grant (sum) count, by(any3d anynovinv)

. 
. decode any3d, gen(any3dl)

. decode anynovinv, gen(anynovinvl)

. 
. gen cat = &quot;3(d): &quot; + any3dl + &quot;; &quot; + &quot;nov/inv: &quot; + anynovinvl + &quot;  (N=&quot; + stri
&gt; ng(count) +&quot;)&quot;

. 
. graph hbar (asis) grant, over(cat, sort(count) descending) blabel(bar, format(
&gt; %3.2g)) ytitle(grant rate) scheme(tufte) 

. 
</code></pre>
<img src="figure4.png" height="400" alt=" " >
<img src="figure4.eps" height="400" alt=" " >
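<p>As a quick cross-check, the grant-rate comparison above can be recomputed from the printed summary statistics alone, using Stata's immediate commands. This is a minimal sketch and not part of the authors' code. The inputs are the group sizes, means, and standard deviations copied from the <code>ttest</code> output. The grant counts (38 of 113 and 69 of 287) are inferred from the reported means, on the assumption that <code>grant</code> is a 0/1 indicator.</p>
<pre><code>* Two-sample t test from summary statistics: obs, mean, sd for each group
ttesti 113 .3362832 .4745415 287 .2404181 .4280838

* Two-sample test of proportions from counts, as a check suited to a 0/1 outcome
prtesti 113 38 287 69, count
</code></pre>
<p>The first command should reproduce the reported t statistic up to rounding of the inputs. The second is offered only as a robustness check and does not appear in the original analysis.</p>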
<h2><a href="#figure-5" id="figure-5">Figure 5</a></h2>
<pre><code>. 
. use processed/india3d,clear

. keep if detailedfer==1
(1,426 observations deleted)

. 
. /* Generate categories for only nov/inv on claim 1 vs. also 3d on claim 1 */
. 
. gen cat1=.
(427 missing values generated)

. replace cat1=0 if claim1_3d==-9 &amp; claim1_novinv==1
(88 real changes made)

. replace cat1=1 if claim1_3d==1 &amp; claim1_novinv==1
(182 real changes made)

. 
. * In text discussion of differences between categories
. tab cat1, sum(grant)

            |          Summary of grant
       cat1 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |   .36363636   .48380242          88
          1 |   .19230769    .3952007         182
------------+------------------------------------
      Total |   .24814815   .43274036         270

. ttest grant, by(cat1)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      88    .3636364    .0515735    .4838024    .2611284    .4661443
       1 |     182    .1923077    .0292942    .3952007    .1345056    .2501098
---------+--------------------------------------------------------------------
combined |     270    .2481481    .0263357    .4327404    .1962978    .2999985
---------+--------------------------------------------------------------------
    diff |            .1713287    .0553098                .0624316    .2802257
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   3.0976
Ho: diff = 0                                     degrees of freedom =      268

    Ha: diff &lt; 0                 Ha: diff != 0                 Ha: diff &gt; 0
 Pr(T &lt; t) = 0.9989         Pr(|T| &gt; |t|) = 0.0022          Pr(T &gt; t) = 0.0011

. 
. /* Collapse and graph */
. 
. gen count = 1

. 
. collapse (mean) grant (sum) count, by(claim1_3d claim1_novinv) 

. 
. decode claim1_3d, gen(claim1_3dl)

. decode claim1_novinv, gen(claim1_novinvl)

. 
. gen cat = &quot;3(d):  &quot; + claim1_3dl + &quot;; &quot; + &quot;nov/inv:  &quot; + claim1_novinvl + &quot;  (
&gt; N=&quot; + string(count) +&quot;)&quot;

. 
. graph hbar (asis) grant, over(cat, sort(count) descending) blabel(bar, format(
&gt; %3.2g)) ytitle(grant rate) scheme(tufte) 

. 
. 
</code></pre>
<img src="figure5.png" height="400" alt=" " >
<img src="figure5.eps" height="400" alt=" " >
<h2><a href="#figure-6" id="figure-6">Figure 6</a></h2>
<pre><code>. 
. use processed/india3d,clear

. keep if detailedfer==1 &amp; grant==1
(1,732 observations deleted)

. 
. * In text discussion of lags for applications with and without 3(d) objections
. 
. ttest lag, by(any3d)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
      no |      48    606.1042    41.45269    287.1927    522.7121    689.4962
     yes |      73     805.726     58.3943    498.9211    689.3191    922.1329
---------+--------------------------------------------------------------------
combined |     121    726.5372    39.75902    437.3492    647.8171    805.2573
---------+--------------------------------------------------------------------
    diff |           -199.6219    79.53439               -357.1079   -42.13583
------------------------------------------------------------------------------
    diff = mean(no) - mean(yes)                                   t =  -2.5099
Ho: diff = 0                                     degrees of freedom =      119

    Ha: diff &lt; 0                 Ha: diff != 0                 Ha: diff &gt; 0
 Pr(T &lt; t) = 0.0067         Pr(|T| &gt; |t|) = 0.0134          Pr(T &gt; t) = 0.9933

. 
. /* Generate categories for nov/inv with and without 3(d) */
. 
. gen cat1=.
(121 missing values generated)

. replace cat1=1 if any3d==1 &amp; anynovinv==1
(69 real changes made)

. replace cat1=0 if any3d==0 &amp; anynovinv==1
(38 real changes made)

. 
. * In text discussion of lags by novelty/inventive step only and novelty/invent
&gt; ive step plus 3(d) 
. 
. ttest lag, by(cat1)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      38    634.1316    50.13371    309.0449     532.551    735.7121
       1 |      69    812.7101    61.15383    507.9818    690.6796    934.7407
---------+--------------------------------------------------------------------
combined |     107    749.2897    43.90398    454.1463    662.2458    836.3336
---------+--------------------------------------------------------------------
    diff |           -178.5786    90.51607               -358.0552    .8980767
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -1.9729
Ho: diff = 0                                     degrees of freedom =      105

    Ha: diff &lt; 0                 Ha: diff != 0                 Ha: diff &gt; 0
 Pr(T &lt; t) = 0.0256         Pr(|T| &gt; |t|) = 0.0511          Pr(T &gt; t) = 0.9744

. 
. /* Collapse and graph */
. 
. gen count = 1

. collapse (mean) lag (sum) count, by(any3d anynovinv)

. 
. decode any3d, gen(any3dl)

. decode anynovinv, gen(anynovinvl)

. 
. gen cat = &quot;3(d):&quot; + any3dl + &quot;; &quot; + &quot;nov/inv:&quot; + anynovinvl + &quot; (N=&quot; + string(
&gt; count) +&quot;)&quot;

. 
. graph hbar (asis) lag, over(cat, sort(count) descending) blabel(bar, format(%4
&gt; .1f)) ytitle(days) scheme(tufte) 

. 
</code></pre>
<img src="figure6.png" height="400" alt=" " >
<img src="figure6.eps" height="400" alt=" " >
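<p>The two groups in the lag comparisons above have quite different standard deviations (roughly 287 versus 499 days in the first test), so a reader may want to check whether the conclusions hold without the equal-variance assumption. The following is a minimal sketch that uses only the summary statistics printed above. The authors' own tests use the default equal-variance form.</p>
<pre><code>* Same comparisons without assuming equal variances (Satterthwaite degrees of freedom)
ttesti 48 606.1042 287.1927 73 805.726 498.9211, unequal
ttesti 38 634.1316 309.0449 69 812.7101 507.9818, unequal
</code></pre>
<p>With the analysis file loaded and <code>cat1</code> generated as above, the equivalent calls would be <code>ttest lag, by(any3d) unequal</code> and <code>ttest lag, by(cat1) unequal</code>.</p>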
<h2><a href="#figure-7" id="figure-7">Figure 7</a></h2>
<pre><code>. 
. use processed/india3d,clear

. keep if anyfer==1 
(570 observations deleted)

. 
. /* Generate a variable for claim 1 specific 3d objection */
. 
. gen claim1_3d_specific = claim1_3d == 1

. 
. * In text discussion of primary vs. secondary
. 
. tab secondary

  Secondary |
    (1=yes) |      Freq.     Percent        Cum.
------------+-----------------------------------
    Primary |        715       55.73       55.73
  Secondary |        568       44.27      100.00
------------+-----------------------------------
      Total |      1,283      100.00

. 
. 
. * In text discussion of primary vs. secondary vs. 3(d)
. 
. tab secondary claim1_3d

 Secondary |             3(d) objection type
   (1=yes) |      none  other cla    claim 1      vague |     Total
-----------+--------------------------------------------+----------
   Primary |       202         30        387         96 |       715 
 Secondary |       277         41        172         78 |       568 
-----------+--------------------------------------------+----------
     Total |       479         71        559        174 |     1,283 


. 
. * In text discussion of claim1 specific vs. secondary
. 
. tab secondary, sum(claim1_3d_specific)

  Secondary |    Summary of claim1_3d_specific
    (1=yes) |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    Primary |   .54125874   .49864363         715
  Secondary |    .3028169   .45988169         568
------------+------------------------------------
      Total |   .43569758   .49604131       1,283

. ttest claim1_3d_specific, by(secondary)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Primary |     715    .5412587    .0186482    .4986436    .5046469    .5778706
Secondar |     568    .3028169    .0192962    .4598817    .2649161    .3407177
---------+--------------------------------------------------------------------
combined |   1,283    .4356976    .0138486    .4960413    .4085293    .4628659
---------+--------------------------------------------------------------------
    diff |            .2384418    .0270843                .1853074    .2915763
------------------------------------------------------------------------------
    diff = mean(Primary) - mean(Secondar)                         t =   8.8037
Ho: diff = 0                                     degrees of freedom =     1281

    Ha: diff &lt; 0                 Ha: diff != 0                 Ha: diff &gt; 0
 Pr(T &lt; t) = 1.0000         Pr(|T| &gt; |t|) = 0.0000          Pr(T &gt; t) = 0.0000

. 
. /* Generate the figure */
. 
. tabplot claim1_3d secondary , scheme(tufte) showval 

. 
. * In text discussion of any 3(d) vs. secondary
. 
. tab secondary any3d, row column

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
+-------------------+

 Secondary |   Any 3d objection?
   (1=yes) |        no        yes |     Total
-----------+----------------------+----------
   Primary |       202        513 |       715 
           |     28.25      71.75 |    100.00 
           |     42.17      63.81 |     55.73 
-----------+----------------------+----------
 Secondary |       277        291 |       568 
           |     48.77      51.23 |    100.00 
           |     57.83      36.19 |     44.27 
-----------+----------------------+----------
     Total |       479        804 |     1,283 
           |     37.33      62.67 |    100.00 
           |    100.00     100.00 |    100.00 


. ttest any3d, by(secondary)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Primary |     715    .7174825    .0168492    .4505388    .6844026    .7505624
Secondar |     568    .5123239    .0209916    .5002887    .4710931    .5535548
---------+--------------------------------------------------------------------
combined |   1,283    .6266563    .0135091    .4838809     .600154    .6531586
---------+--------------------------------------------------------------------
    diff |            .2051586    .0265972                .1529798    .2573374
------------------------------------------------------------------------------
    diff = mean(Primary) - mean(Secondar)                         t =   7.7136
Ho: diff = 0                                     degrees of freedom =     1281

    Ha: diff &lt; 0                 Ha: diff != 0                 Ha: diff &gt; 0
 Pr(T &lt; t) = 1.0000         Pr(|T| &gt; |t|) = 0.0000          Pr(T &gt; t) = 0.0000

.  
. 
</code></pre>
<img src="figure7.png" height="400" alt=" " >
<img src="figure7.eps" height="400" alt=" " >
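<p>The cross-tabulation of primary/secondary status against any 3(d) objection can also be re-entered directly from the printed cell counts, which is convenient for checking the percentages without the data file. This sketch uses only the four frequencies shown above. The <code>chi2</code> option adds a Pearson chi-squared test that the original output does not report.</p>
<pre><code>* Rows: Primary, Secondary; columns: no 3(d) objection, any 3(d) objection
tabi 202 513 \ 277 291, row column chi2
</code></pre>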
<h2><a href="#figure-8" id="figure-8">Figure 8</a></h2>
<pre><code>. 
. use processed/india3d,clear

. keep if anyfer==1 
(570 observations deleted)

. 
. gen claim1_3d_specific = claim1_3d == 1

. 
. * Discussion in text: 3d for most recent applications
. 
. tab any3d if year &gt;=2009 &amp; year &lt;=2011

     Any 3d |
 objection? |      Freq.     Percent        Cum.
------------+-----------------------------------
         no |         54       22.98       22.98
        yes |        181       77.02      100.00
------------+-----------------------------------
      Total |        235      100.00

. tab claim1_3d_specific  if year &gt;=2009 &amp; year &lt;=2011

claim1_3d_s |
    pecific |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         83       35.32       35.32
          1 |        152       64.68      100.00
------------+-----------------------------------
      Total |        235      100.00

. 
. gen rejected3d1 = claim1_3d==1

. 
. /* Collapse and graph */
. 
. collapse (mean) any3d rejected3d1, by(year secondary)

. label var any3d &quot;3(d) objection&quot;

. label var rejected3d1 &quot;3(d) objection on claim 1&quot;

. label var year &quot;Application Year&quot;

. 
. keep if year&gt;=2002 &amp; year&lt;=2010
(3 observations deleted)

. 
. label define secondaryl 0 &quot;Primary&quot; 1 &quot;Secondary&quot;, replace

. label values secondary secondaryl

. 
. 
. twoway (line any3d rejected3d1 year), scheme(tufte) by(secondary)  

. 
</code></pre>
<img src="figure8.png" height="400" alt=" " >
<img src="figure8.eps" height="400" alt=" " >
<h2><a href="#clean-up" id="clean-up">Clean up</a></h2>
<p>This will delete intermediate files to save space.</p>
<pre><code>. foreach x in TPF_Core TPF_IPC TPF_PCT TPF_extract_pharma_india family_to_ipc f
&gt; er_coded_3d fer_coded_3d_detailed fer_coded_3d_detailed_grant finalextract-cod
&gt; ed finalextract ipo-outcomes fer_coded_3d_detailed_grants TPF_extract TPF_extr
&gt; act_pharma {

</code></pre>
<h2><a href="#generate-a-pdf-version-of-the-code" id="generate-a-pdf-version-of-the-code">Generate a PDF version of the code</a></h2>
<p>If you have Pandoc installed, the following command should generate the PDF file, which is simply a somewhat better-formatted version of the Markdown file: <code>pandoc &quot;code/Data Appendix.md&quot; --listings -H code/listings-setup.tex -o &quot;Data Appendix.pdf&quot;</code></p>
<h1><a href="#references" id="references">References</a></h1>
<p>Hemphill, C. Scott, and Bhaven N. Sampat. &ldquo;Evergreening, patent challenges, and effective market life in pharmaceuticals.&rdquo; <em>Journal of Health Economics</em> 31.2 (2012): 327-339.</p>
<p>Hemphill, C. Scott, and Bhaven N. Sampat. &ldquo;When do generics challenge drug patents?&rdquo; <em>Journal of Empirical Legal Studies</em> 8.4 (2011): 613-649.</p>
<p>Sampat, Bhaven N., and Kenneth C. Shadlen. &ldquo;Secondary pharmaceutical patenting: A global perspective.&rdquo; <em>Research Policy</em> 46.3 (2017): 693-707.</p>
